direction estimation
Vi-TacMan: Articulated Object Manipulation via Vision and Touch
Cui, Leiyao, Zhao, Zihang, Xie, Sirui, Zhang, Wenhuan, Han, Zhi, Zhu, Yixin
Autonomous manipulation of articulated objects remains a fundamental challenge for robots in human environments. Vision-based methods can infer hidden kinematics but can yield imprecise estimates on unfamiliar objects. Tactile approaches achieve robust control through contact feedback but require accurate initialization. This suggests a natural synergy: vision for global guidance, touch for local precision. Yet no framework systematically exploits this complementarity for generalized articulated manipulation. Here we present Vi-TacMan, which uses vision to propose grasps and coarse directions that seed a tactile controller for precise execution. By incorporating surface normals as geometric priors and modeling directions via von Mises-Fisher distributions, our approach achieves significant gains over baselines (all p<0.0001). Critically, manipulation succeeds without explicit kinematic models -- the tactile controller refines coarse visual estimates through real-time contact regulation. Tests on more than 50,000 simulated and diverse real-world objects confirm robust cross-category generalization. This work establishes that coarse visual cues suffice for reliable manipulation when coupled with tactile feedback, offering a scalable paradigm for autonomous systems in unstructured environments.
- Research Report > New Finding (0.34)
- Research Report > Experimental Study (0.34)
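The abstract above names two concrete ingredients: surface normals as geometric priors and von Mises-Fisher (vMF) modeling of manipulation directions. As a minimal sketch of the vMF ingredient (not the authors' implementation), the Python below evaluates the vMF log-density on the unit sphere and fits the mean direction and concentration from noisy direction samples via the standard Banerjee et al. approximation; treating the surface normal as the distribution's mean direction is an assumption made for illustration.

```python
import numpy as np

def vmf_log_pdf(x, mu, kappa):
    # Log-density of the von Mises-Fisher distribution on the unit sphere S^2:
    # log C_3(kappa) + kappa * mu.x, with log(sinh kappa) computed stably.
    log_sinh = kappa + np.log1p(-np.exp(-2.0 * kappa)) - np.log(2.0)
    log_norm = np.log(kappa) - np.log(4.0 * np.pi) - log_sinh
    return log_norm + kappa * float(np.dot(mu, x))

def vmf_fit(dirs):
    # Maximum-likelihood mean direction plus the Banerjee et al. (2005)
    # closed-form approximation of the concentration kappa for d = 3.
    r = dirs.mean(axis=0)
    r_norm = np.linalg.norm(r)
    mu = r / r_norm
    kappa = r_norm * (3.0 - r_norm**2) / (1.0 - r_norm**2)
    return mu, kappa

# Toy usage: noisy direction estimates scattered around a surface normal.
rng = np.random.default_rng(0)
normal = np.array([0.0, 0.0, 1.0])   # hypothetical geometric prior
samples = normal + 0.1 * rng.standard_normal((200, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
mu, kappa = vmf_fit(samples)
print(mu, kappa, vmf_log_pdf(normal, mu, kappa))
```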
Illuminant and light direction estimation using Wasserstein distance method
Illumination estimation remains a pivotal challenge in image processing, particularly for robotics, where robust environmental perception is essential under varying lighting conditions. Traditional approaches, such as RGB histograms and GIST descriptors, often fail in complex scenarios due to their sensitivity to illumination changes. This study introduces a novel method utilizing the Wasserstein distance, rooted in optimal transport theory, to estimate the illuminant and light direction in images. Experiments on diverse images, including indoor scenes, black-and-white photographs, and night images, demonstrate the method's efficacy in detecting dominant light sources and estimating their directions, outperforming traditional statistical methods in complex lighting environments. The approach shows promise for applications in light source localization, image quality assessment, and object detection enhancement. Future research may explore adaptive thresholding and integrate gradient analysis to further improve accuracy, offering a scalable solution for real-world illumination challenges in robotics and beyond.
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
- Europe > Montenegro (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
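The abstract does not spell out the algorithm, but its core primitive, comparing intensity distributions with the 1-D Wasserstein distance, is easy to illustrate. The hedged sketch below scores a coarse light direction by measuring the transport cost between luminance samples of opposing image halves; the half-image split and the sign convention are assumptions for illustration, not the paper's method.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def light_direction_scores(gray):
    # Signed horizontal/vertical light cues: 1-D Wasserstein distance between
    # luminance samples of opposing image halves, signed by mean brightness.
    h, w = gray.shape
    left, right = gray[:, : w // 2].ravel(), gray[:, w // 2 :].ravel()
    top, bottom = gray[: h // 2].ravel(), gray[h // 2 :].ravel()
    horiz = wasserstein_distance(left, right) * np.sign(right.mean() - left.mean())
    vert = wasserstein_distance(top, bottom) * np.sign(bottom.mean() - top.mean())
    return horiz, vert  # positive: light from the right / from the bottom

# Toy usage: a synthetic scene lit from the right.
img = np.tile(np.linspace(0.2, 1.0, 64), (64, 1))
print(light_direction_scores(img))  # large horizontal score, ~zero vertical
```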
See Behind Walls in Real-time Using Aerial Drones and Augmented Reality
Yang, Sikai, Yang, Kang, Chen, Yuning, Zhao, Fan, Du, Wan
This work presents ARD2, a framework that enables real-time through-wall surveillance using two aerial drones and an augmented reality (AR) device. ARD2 consists of two main steps: target direction estimation and contour reconstruction. In the first stage, ARD2 leverages geometric relationships between the drones, the user, and the target to project the target's direction onto the user's AR display. In the second stage, images from the drones are synthesized to reconstruct the target's contour, allowing the user to visualize the target behind walls. Experimental results demonstrate the system's accuracy in both direction estimation and contour reconstruction.
- North America > United States > New York (0.04)
- North America > United States > California > Merced County > Merced (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
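The geometric relationships ARD2 exploits are not detailed in the abstract. Purely as an assumed reconstruction of the first stage, the sketch below triangulates a target from two drone sight rays by least squares and converts the user-to-target vector into an azimuth that an AR display could render; the drone poses and unit bearings are hypothetical inputs, not the system's actual interface.

```python
import numpy as np

def triangulate(origins, dirs):
    # Least-squares intersection of 3-D rays (origin o_i, unit direction d_i):
    # minimize sum_i ||(I - d_i d_i^T)(p - o_i)||^2 over the point p.
    A, b = np.zeros((3, 3)), np.zeros(3)
    for o, d in zip(origins, dirs):
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

def azimuth(user_pos, target_pos):
    # Heading of the target as seen by the user (radians, +x toward +y).
    v = target_pos - user_pos
    return np.arctan2(v[1], v[0])

# Toy usage: two drones sighting a hidden target at (5, 5, 0).
target = np.array([5.0, 5.0, 0.0])
drones = [np.array([0.0, 0.0, 10.0]), np.array([10.0, 0.0, 10.0])]
rays = [(target - o) / np.linalg.norm(target - o) for o in drones]
p = triangulate(drones, rays)
print(p, azimuth(np.array([0.0, -5.0, 0.0]), p))
```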
LNSMM: Eye Gaze Estimation With Local Network Share Multiview Multitask
Huang, Yong, Chen, Ben, Qu, Daiming
Eye gaze estimation has become increasingly significant in computer vision. In this paper, we systematically study the mainstream eye gaze estimation methods and propose a novel methodology to estimate eye gaze points and eye gaze directions simultaneously. First, we construct a local sharing network for feature extraction in gaze-point and gaze-direction estimation, which reduces the network's computational parameters and converges quickly. Second, we propose a Multiview Multitask Learning (MTL) framework: for gaze directions, a coplanar constraint is imposed on the left and right eyes; for gaze points, three-view input indirectly introduces eye position information; a cross-view pooling module is designed; and a joint loss handles both gaze-point and gaze-direction estimation. Finally, we collect a three-view gaze-point dataset, a configuration not available in existing public datasets. Experiments show our method outperforms current mainstream methods on both gaze-point and gaze-direction metrics.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
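The network details are not given in the abstract, but the shared-feature multitask idea can be sketched generically. Below is a hypothetical PyTorch model with one shared backbone and two heads (gaze point and gaze direction) trained with a joint loss; the layer sizes, the cosine direction loss, and the weight `w` are illustrative assumptions, and the coplanar constraint and cross-view pooling module are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedGazeNet(nn.Module):
    # One shared backbone feeding a gaze-point head (2-D screen coordinates)
    # and a gaze-direction head (unit 3-D vector). Sizes are hypothetical.
    def __init__(self, in_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                      nn.Linear(256, 128), nn.ReLU())
        self.point_head = nn.Linear(128, 2)
        self.dir_head = nn.Linear(128, 3)

    def forward(self, x):
        f = self.backbone(x)
        return self.point_head(f), F.normalize(self.dir_head(f), dim=-1)

def joint_loss(pt_pred, pt_gt, dir_pred, dir_gt, w=0.5):
    # Joint objective: L2 on gaze points + cosine loss on gaze directions.
    point_term = torch.mean(torch.sum((pt_pred - pt_gt) ** 2, dim=-1))
    dir_term = torch.mean(1.0 - torch.sum(dir_pred * dir_gt, dim=-1))
    return point_term + w * dir_term

# Toy usage on random features standing in for per-view eye features.
net = SharedGazeNet()
pt, dr = net(torch.randn(8, 128))
loss = joint_loss(pt, torch.zeros(8, 2), dr, F.normalize(torch.randn(8, 3), dim=-1))
loss.backward()
```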
Improved Generalization of Heading Direction Estimation for Aerial Filming Using Semi-supervised Regression
Wang, Wenshan, Ahuja, Aayush, Zhang, Yanfu, Bonatti, Rogerio, Scherer, Sebastian
In the task of autonomous aerial filming of a moving actor (e.g., a person or a vehicle), it is crucial to obtain a good heading direction estimate for the actor from the visual input. However, models obtained in similar tasks, such as pedestrian collision risk analysis and human-robot interaction, are very difficult to generalize to the aerial filming task because of the difference in data distributions. Towards improving generalization with less labeled data, this paper presents a semi-supervised algorithm for the heading direction estimation problem. We utilize temporal continuity as the unsupervised signal to regularize the model and achieve better generalization. This semi-supervised algorithm is applied in both the training and testing phases, which increases testing performance by a large margin. We show that by leveraging unlabeled sequences, the amount of labeled data required can be significantly reduced. We also discuss several important details for improving performance, such as balancing the labeled and unlabeled losses and combining them effectively. Experimental results show that our approach robustly outputs heading directions for different types of actors, and the aesthetic value of the video in the aerial filming task is also improved.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Asia > Japan > Honshū > Chūbu > Shizuoka Prefecture > Shizuoka (0.04)
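The paper's central idea, temporal continuity as an unsupervised regularizer, translates naturally into a two-term loss. The sketch below combines a supervised term on labeled frames with a frame-to-frame smoothness penalty on predictions over an unlabeled sequence; representing headings as (cos, sin) unit vectors and the weight `lam` are illustrative choices, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(pred_labeled, gt, pred_seq, lam=0.1):
    # Supervised term on labeled frames plus a temporal-continuity penalty
    # on consecutive predictions from an unlabeled sequence. Headings are
    # (cos, sin) unit vectors to avoid angle wraparound; lam is illustrative.
    sup = torch.mean(torch.sum((pred_labeled - gt) ** 2, dim=-1))
    smooth = torch.mean(torch.sum((pred_seq[1:] - pred_seq[:-1]) ** 2, dim=-1))
    return sup + lam * smooth

# Toy usage with random stand-ins for network outputs.
pred_l = F.normalize(torch.randn(4, 2), dim=-1)   # labeled-frame predictions
labels = F.normalize(torch.randn(4, 2), dim=-1)   # ground-truth headings
pred_u = F.normalize(torch.randn(16, 2), dim=-1)  # unlabeled sequence
print(semi_supervised_loss(pred_l, labels, pred_u))
```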